Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models

Authors

  • Pratyush Banerjee
  • Sudip Kumar Naskar
  • Johann Roturier
  • Andy Way
  • Josef van Genabith

Abstract

Supplementary data selection from out-of-domain or related-domain data is a well-established technique in domain adaptation of statistical machine translation. The selection criteria for such data are mostly based on measures of similarity with available in-domain data, but not directly on translation quality. In this paper, we present a technique for selecting supplementary data to improve translation performance directly in terms of translation quality, as measured by automatic evaluation metric scores. Batches of data selected from out-of-domain corpora are incrementally added to an existing baseline system and evaluated in terms of translation quality on a development set. A batch is selected only if its inclusion improves translation quality. To assist the process, we present a novel translation model merging technique that allows rapid retraining of the translation models with incremental data. When incorporated into the ‘in-domain’ translation models, the final cumulatively selected datasets are found to provide statistically significant improvements for a number of different supplementary datasets. Furthermore, the translation model merging technique is found to perform on a par with state-of-the-art methods of phrase-table combination.
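
The batch-wise selection loop described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `select_batches`, `evaluate`, `min_gain` and `toy_dev_score` are hypothetical, accepted batches are assumed to stay in the training pool for later comparisons, and the scoring function is a toy vocabulary-coverage proxy standing in for retraining the translation model and computing an automatic metric on a development set.

```python
from typing import Callable, List, Sequence, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def select_batches(
    baseline: Sequence[SentencePair],
    batches: Sequence[Sequence[SentencePair]],
    evaluate: Callable[[List[SentencePair]], float],
    min_gain: float = 0.0,
) -> Tuple[List[int], List[SentencePair], float]:
    """Greedy selection: keep a batch only if adding it raises the dev score.

    `evaluate` is a hypothetical callback; in a real setup it would retrain
    (or incrementally update) the system and score a development set.
    """
    selected = list(baseline)
    best = evaluate(selected)
    kept: List[int] = []
    for index, batch in enumerate(batches):
        candidate = selected + list(batch)
        score = evaluate(candidate)
        if score > best + min_gain:
            # The batch improved the dev score, so it joins the pool.
            selected = candidate
            best = score
            kept.append(index)
    return kept, selected, best


def toy_dev_score(training_pairs: List[SentencePair]) -> float:
    """Toy proxy (assumption): fraction of dev source words seen in training."""
    dev_vocab = {"printer", "driver", "install", "error"}
    seen = {word for source, _ in training_pairs for word in source.split()}
    return len(dev_vocab & seen) / len(dev_vocab)
```

In this sketch a batch that merely matches the current score is rejected, which mirrors the abstract's condition that a batch is selected only if its inclusion improves translation quality. The cost of each `evaluate` call is what motivates a fast model-update step such as the merging technique the abstract mentions.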

Similar resources

Quality Assessment of the Persian Translation of John Steinbeck’s Of Mice and Men Based on Waddington’s Model of Translation: Application of Method A

Considering the statement that errors can affect the quality of translations, the need to adopt an objective model to analyze these errors has been one of the most debated issues in translation quality assessment. In recent decades, some objective models have emerged with an error analysis nature according to which evaluators can make decisions on the quality of translations. In this study, Met...

Assessing Quick Update Methods of Statistical Translation Models

The ability to quickly incorporate incoming training data into a running translation system is critical in a number of applications. Mechanisms based on incremental model update and the online EM algorithm hold the promise of achieving this objective in a principled way. Still, efficient tools for incremental training are yet to be available. In this paper we experiment with simple alternative ...

Evaluation of the Validity and Reliability of a Communicative Scale for Translation Quality Assessment

The present study assessed the construct validity and reliability of a researcher-constructed psycho-motor mechanism scale based on the communicative theory of translation proposed by PACTE (2003). In doing so, the necessary criteria for designing the scale were obtained by a thorough review of related literature on previously constructed scales in error analysis or holistic ones. Moreover, in ...

Publication date: 2012